ABSTRACT


The empirical model builder using regression techniques frequently relies on the coefficient of determination, $R^2$, to measure 'goodness of fit'. Costing and pricing analysts who use variable selection techniques frequently encounter inflated $R^2$ values. This paper examines the space within which the regression model operates and presents practical optimization algorithms to help assess the amount of confidence that can be placed in $R^2$ for a particular set of candidate predictor variables. The algorithms describe a technique that uses linear programming to find the lowest value of $R^2$ possible for the given set of data.
I. INTRODUCTION


A. PROBLEM MOTIVATION 

Parametric cost estimation is a management tool used to aid in the prediction of the cost of a proposed system. It involves predicting the cost (dependent variable) of a system by means of explanatory (independent) variables such as system characteristics or performance requirements. This procedure is based on the premise that the cost of a system is related in a quantifiable way to the system's physical and performance characteristics. This quantifiable relationship is expressed as an estimating equation derived through statistical regression analysis of historical cost data on systems which are, more or less, analogous to the proposed system. Since parametric cost estimates can be developed during the concept formulation stage of the acquisition process, before engineering plans are finalized, these estimates can be used by management to:


(1) Identify possible cost/performance tradeoffs in the 
design effort. 


(2) Provide a basis for cost/effectiveness review of 
performance specifications. 


(3) Provide information useful in the ranking of 
competing alternatives. 


(4) Suggest a need for investigating new alternatives.

Cost overruns have been prevalent in the acquisition process for new weapon systems, making cost estimation a very important problem for all components of the Department of Defense.

To combat this problem, the Department of Defense has issued directives to employ independent parametric cost estimation. Publications such as Reference [1] have appeared which give a step-by-step methodology for the development of a parametric cost estimate.

Regression problems faced by costing and pricing analysts in these situations are inherently difficult for two fundamental reasons [Ref. 2]:

(1) The number of observations is usually small compared with the number of system characteristics which are candidate components of the regression equation.

(2) The available data is not produced by employing an efficient experimental design.


Under these circumstances, it has been shown that the use of variable selection techniques may result in regression equations which yield inflated $R^2$ values whose statistical significance cannot be tested using the F-test.¹

In general parametric cost estimation, an analyst should not blindly trust the regression equation resulting from his analysis. To measure the 'goodness of fit', the analyst can use such statistics as $R^2$ and $F$. (As noted earlier, however, regression models for cost estimation often do not allow use of $F$.) There are few hard and fast rules for assessing the usefulness of such a model. This is especially true of models that result from the application of a variable selection technique in order to obtain a 'best' prediction equation. The $R^2$ statistic in these situations may not give a meaningful indication of the model's applicability.

¹This is the case when $R^2$ is not significant, and some, but not all, of the $\beta_i$ are significant.

The purpose of this paper is to investigate the coefficient of determination, $R^2$, used in best subset regression analysis. The solution algorithms presented in this paper provide a practical method to help assess the confidence placed by the empirical model builder in $R^2$ for a regression upon a particular set of exogenous data. The work may also contribute to the theoretical foundation for understanding regression models, whose properties are not fully understood.


B. STATEMENT OF THE PROBLEM 

Suppose the analyst selects $n$ independent observations on $p$ predictor (candidate) variables and one dependent variable. The goal of the analysis is to determine the $k$-variable regression equation which maximizes the coefficient of determination for various values of $k$. The difficulty with this analysis is assessing the statistical significance of $R^2$ for a given value of $k$.

The $n$ independent observations, mathematically, span an $n$-dimensional finite vector space (call it $E^n$). The regression procedure projects the dependent variable, $Y$, onto subspaces within $E^n$, looking for the best fit (prediction). How the subspaces are oriented in $E^n$ therefore dictates the quality of fit that can be obtained. The subspaces of $E^n$ are determined by the candidate predictor variables: each predictor variable's observed values represent a vector in $E^n$, and each combinatorial set of the $p$ predictor variables spans a subspace. There are therefore $\binom{p}{k}$ possible $k$-variable prediction equations, and the analyst's selection of the $p$ variables to be used as candidates determines which subspaces are available for the regression procedure to consider in best subset selection.²
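Best subset selection as described above can be sketched as brute-force enumeration of all $\binom{p}{k}$ subsets. The following is a minimal pure-Python sketch, not the procedure of this paper: the function names are hypothetical, the intercept is assumed to be included (it is absorbed by centering $Y$ and every column), and each predictor column is assumed to be given as a list of $n$ numbers.

```python
from itertools import combinations


def center(v):
    """Subtract the mean, which absorbs the intercept term."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def r_squared(y, cols):
    """R^2 of the least squares fit of y on `cols` plus an intercept.

    The centered columns are orthonormalized (Gram-Schmidt); R^2 is the squared
    length of the projection of centered y divided by its own squared length.
    """
    yc = center(y)
    basis = []
    for col in cols:
        w = center(col)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip columns that add no new direction
            basis.append([a / norm for a in w])
    return sum(dot(yc, q) ** 2 for q in basis) / dot(yc, yc)


def best_subset(y, x_cols, k):
    """Return (largest R^2, column indices) over all C(p, k) subsets of size k."""
    best = (-1.0, ())
    for idx in combinations(range(len(x_cols)), k):
        r2 = r_squared(y, [x_cols[i] for i in idx])
        if r2 > best[0]:
            best = (r2, idx)
    return best


y = [1.0, 3.0, 5.0, 7.0]
x_cols = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 1.0, 0.0]]
print(best_subset(y, x_cols, 1))  # the second column fits y exactly
```

The enumeration cost grows with $\binom{p}{k}$, which is the practical reason exhaustive search becomes expensive as $p$ grows.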

Wallenius [Ref. 3], in trying to gather information on the unknown distribution of $R^2$, asks, "How well do the $\binom{p}{k}$ subspaces spanned by all the subsets of $k$ columns of $X$ 'fill' $E^n$?" (Here $X$ is the $(n \times p)$ matrix of $n$ observations on the system's $p$ characteristics.) In other words, using the candidate predictor variables selected by the analyst, will the highest $R^2$ value obtainable through regression differ very much from the worst possible?


Wallenius characterized this problem mathematically by defining the 'coefficient of fill' (COF) as follows:

$$R^2_{\min}(X,k) = \min_{Y}\ \max_{1 \le i_1 < i_2 < \cdots < i_k \le p} R^2_{Y \cdot X_{i_1} X_{i_2} \cdots X_{i_k}}$$


This formula can be interpreted as asking, "Given some set of candidate predictor variables, what is the worst $Y$ that can be predicted?", where 'worst' is identified by the lowest $R^2$ value obtainable. Answering this question therefore also yields a lower bound for $R^2$.

²The total number of exogenous variables that $Y$ is regressed upon is $p$.
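Because the minimization over $Y$ is what makes the COF hard, one crude way to get a feel for it is random search: every sampled $Y$ yields an attainable value of the inner maximum, so the smallest value found can only overestimate the true COF. The sketch below is such an illustration under stated assumptions (pure Python, Gaussian random $Y$, hypothetical function names); it is not the algorithm proposed in this paper.

```python
import random
from itertools import combinations


def r_squared(y, cols):
    """R^2 of y regressed (with intercept) on cols, via centered Gram-Schmidt."""
    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    yc, basis = center(y), []
    for col in cols:
        w = center(col)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            basis.append([a / norm for a in w])
    return sum(dot(yc, q) ** 2 for q in basis) / dot(yc, yc)


def cof_upper_bound(x_cols, k, trials=2000, seed=0):
    """Monte Carlo estimate of min over Y of (max over k-subsets of R^2).

    Each random Y gives an attainable inner maximum, so the returned value
    is an upper bound on the coefficient of fill, not the COF itself.
    """
    rng = random.Random(seed)
    n = len(x_cols[0])
    subsets = list(combinations(range(len(x_cols)), k))
    worst = 1.0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(r_squared(y, [x_cols[i] for i in idx]) for idx in subsets)
        worst = min(worst, best)
    return worst


x_cols = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 2.0, 1.0, 2.0], [0.0, 1.0, 0.0, 1.0, 1.0]]
print(cof_upper_bound(x_cols, 1, trials=500))
```

Random sampling rarely lands near the true worst $Y$ in high dimensions, which is one motivation for the optimization approach developed in the following chapters.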

The problem is extremely difficult to solve directly. This paper proposes an algorithm for solving it using optimization with a surrogate objective function. The number of optimizations required to obtain a global solution grows exponentially with $p$ (the number of predictor variables). Thus, in the area of $p = 14$, enumeration begins to become economically infeasible as a solution technique, and hence an algorithm is proposed to search the area about some minimum point. This latter algorithm cannot guarantee a global solution, so it must be considered local in nature. Whether such a local solution is useful requires further research; usefulness may depend directly upon the empirical nature of the data.


II. CORE CONCEPTS


A. GENERAL LINEAR REGRESSION USING LEAST SQUARES 
A general and frequently used linear model is the 'multiple linear regression model'. It can be represented as follows:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im} + e_i, \qquad i = 1, \ldots, n.$$

The variable represented by $y$ is the variable of interest (i.e., the variable to be predicted). The variables represented by the $x_j$ are associated with $y$ and may influence the behavior of $y$. Thus, $y$ is called the dependent (endogenous) variable and the $x$ variables are called independent (exogenous) variables. Statistically, this model is referred to as the regression of $y$ on the $x$ variables. The coefficients $\beta_j$ are referred to as 'partial regression coefficients', and they specify the linear functional relationship between the independent variables and the dependent variable. Mathematically, the $\beta_j$ are the partial derivatives of this functional relationship, $\beta_j = \partial y / \partial x_j$. Thus, $\beta_j$ indicates the change in the dependent variable $y$ corresponding to a unit change in the independent variable $x_j$ (all other independent variables held fixed).

There are various criteria used in regression; however, the formulation of interest for this paper is based upon the least squares criterion. The use of least squares to derive the formula for estimating $Y$ is shown below.


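Using matrix notation, the regression model can be written

$$Y = X\beta + e,$$

where

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1m} \\ 1 & x_{21} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}, \quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix},$$

and

$X$ is the $n \times (m+1)$ matrix of $n$ observations on $m$ independent ($x$) variables plus a dummy variable.
$Y$ is the column vector of the $n$ observed values of $y$.
$\beta$ is the column vector of the $m+1$ partial regression coefficients (whereas $\hat{\beta}$ is the vector of estimated regression coefficients).
$e$ is the column vector of the $n$ error terms.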





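Formulating the least squares problem,

$$\min_{\beta}\ e^T e = (Y - X\beta)^T (Y - X\beta),$$

taking the derivative with respect to $\beta$ and setting it to zero,

$$\frac{\partial}{\partial \beta}\left[Y^T Y - 2Y^T X\beta + \beta^T X^T X \beta\right] = -2X^T Y + 2X^T X\beta = 0,$$

and then solving for $\beta$, we get the estimate

$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$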
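The estimate $\hat{\beta} = (X^T X)^{-1} X^T Y$ can be computed by forming the normal equations $X^T X \beta = X^T Y$ and solving them. The sketch below does this in pure Python with Gaussian elimination; the helper names are hypothetical, and a column of ones is prepended as the dummy variable.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X^T X is singular, so beta-hat is not unique")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(x_rows, y):
    """beta_hat = (X^T X)^{-1} X^T Y, with a leading dummy column of ones."""
    X = [[1.0] + list(row) for row in x_rows]
    m1 = len(X[0])  # m + 1 coefficients
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(m1)] for a in range(m1)]
    XtY = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(m1)]
    return solve(XtX, XtY)


x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1, 3, 0, 2, 2]  # generated exactly from y = 1 + 2 x1 - x2
print(least_squares(x_rows, y))  # approximately [1.0, 2.0, -1.0]
```

Forming $X^T X$ explicitly mirrors the derivation; for larger or ill-conditioned problems a QR-based solver is usually preferred.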


B. STANDARDIZATION AND NORMALIZATION OF VECTORS 

The regression coefficients of the linear model are functions of the units of measurement of the variables; the magnitudes of the coefficients are influenced by the choice of units. Thus, a scaling problem equivalent to that experienced in linear programming exists. This scaling problem is avoided by use of 'standardized regression coefficients'.

Standardized regression coefficients are the end result when the variables from which they are estimated have been transformed to unit variance.

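Consistent with previous notation, and for use further on in this paper, an alternative form to obtain 'standardized' values of the dependent variable $y_i$ for all $i$ (represented as $Y^*$) is:

$$y_i^* = \frac{y_i - \bar{y}}{s_y}, \qquad \text{or, in vector form,} \qquad Y^* = \frac{1}{s_y}\left(Y - \bar{y}\,\mathbf{1}\right),$$

where
(a) $*$ denotes 'standardized';
(b) $\mathbf{1}$ denotes an $(n \times 1)$ column vector of ones;
and $\bar{y}$ and $s_y$ are the sample mean and standard deviation of $y$.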


Of great importance is that a linear transformation has been performed, and that the intercept of the regression equation is zero (i.e., the equation passes through the origin). Each partial regression coefficient indicates how many standard deviations of change in $y$ are associated with a one-standard-deviation change in the corresponding $x$ (all other $x_j$ held fixed). Also, a mathematical characteristic is that

$$\sum_{i=1}^{n} y_i^* = 0,$$

or, equivalently,

$$\mathbf{1}^T Y^* = 0.$$


Mathematically, normalization is a transformation that takes any given non-zero vector and rescales it to unit length (length in multidimensional vector spaces is discussed in the next section). A vector of unit length is said to have norm 1. Any non-zero vector can be transformed into a unit vector by dividing it by its norm.


A vector with unit length can be depicted as follows:

[Figure: a vector of unit length]
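The standardization and normalization steps of this section can be sketched as follows. This is a minimal pure-Python illustration; the function names are hypothetical, and it assumes the standard deviation is the sample standard deviation with divisor $n-1$, which the text does not specify.

```python
import math


def standardize(y):
    """y*_i = (y_i - ybar) / s_y, with s_y the sample standard deviation (n - 1 divisor)."""
    n = len(y)
    ybar = sum(y) / n
    s_y = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    return [(v - ybar) / s_y for v in y]


def normalize(v):
    """Divide a non-zero vector by its Euclidean norm, giving a vector of norm 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


y_star = standardize([2.0, 4.0, 6.0, 8.0])
print(sum(y_star))            # 0 up to rounding, i.e. 1^T Y* = 0
print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```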





C. LENGTH, ANGLE, AND COSINE FUNCTION IN MULTIDIMENSIONAL 
VECTOR SPACES 


Using the concept of inner products, length or magnitude 
of a vector can be defined. In this context, the length is 
referred to as the ‘norm’ of the vector. 

The norm of an arbitrary vector $(x_1, x_2, \ldots, x_n)$ in $R^n$ is denoted by

$$\|(x_1, x_2, \ldots, x_n)\| = \sqrt{(x_1, x_2, \ldots, x_n) \cdot (x_1, x_2, \ldots, x_n)} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

In matrix notation,

$$\|X\| = \sqrt{X^T X}.$$

Thus, a normalized vector would be created as follows:

$$\tilde{X} = \frac{X}{\|X\|}.$$


To obtain the cosine of the angle between two vectors, one normalizes each vector and then takes their inner product. If $u$ and $v$ are vectors in $R^n$, the cosine of the angle $\theta$ between them can be defined as:

$$\cos\theta = \frac{u}{\|u\|} \cdot \frac{v}{\|v\|} = \frac{u \cdot v}{\|u\|\,\|v\|}.$$

Two vectors are orthogonal if and only if their inner product is zero. This means the angle between them is 90°, $\cos\theta = 0$, and $v \cdot u = 0$.
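The norm, cosine, and orthogonality definitions above translate directly into code. A minimal pure-Python sketch, with illustrative function names only:

```python
import math


def norm(v):
    """||v|| = sqrt(v . v)."""
    return math.sqrt(sum(a * a for a in v))


def cosine(u, v):
    """cos(theta) = u . v / (||u|| ||v||), the inner product of the normalized vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def orthogonal(u, v, tol=1e-12):
    """True when the inner product is zero (to within a floating-point tolerance)."""
    return abs(sum(a * b for a, b in zip(u, v))) <= tol


print(norm([3.0, 4.0]))                                  # 5.0
print(math.degrees(math.acos(cosine([1, 0], [1, 1]))))  # about 45 degrees
print(orthogonal([1, 2], [-2, 1]))                      # True
```

The tolerance in `orthogonal` is a design choice for floating-point data; exact zero tests are unreliable after arithmetic.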





D. PROJECTIONS 

Let $u$ and $v$ be vectors with angle $\alpha$ between them. The scalar projection (or component) of $v$ in the direction of $u$ is defined to be $\|v\|\cos\alpha$. Geometrically, it can be visualized as in figure one.³

Figure 1. Scalar Projection of v onto u

Alternatively, computation is easier if written as:

$$\text{scalar projection of } v \text{ onto } u = v \cdot \frac{u}{\|u\|}.$$

This is the inner product (dot product in $E^n$) of $v$ with the unit vector in the direction of $u$.

³Note that the dashed line is perpendicular to $u$.

The vector projection is the scalar projection times the unit vector in the direction of $u$. So, the vector projection of $v$ onto $u$ is written:

$$\operatorname{proj}_u v = \left(\|v\|\cos\alpha\right)\frac{u}{\|u\|} = \left(v \cdot \frac{u}{\|u\|}\right)\frac{u}{\|u\|} = \frac{v \cdot u}{\|u\|^2}\,u.$$
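The scalar and vector projection formulas above can be checked numerically. A minimal pure-Python sketch with hypothetical function names; the last line confirms that what remains of $v$ after projection is perpendicular to $u$, the dashed line in figure one.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scalar_projection(v, u):
    """||v|| cos(alpha) = v . u / ||u||."""
    return dot(v, u) / math.sqrt(dot(u, u))


def vector_projection(v, u):
    """(v . u / ||u||^2) u: the scalar projection times the unit vector along u."""
    c = dot(v, u) / dot(u, u)
    return [c * a for a in u]


v, u = [3.0, 4.0], [2.0, 0.0]
p = vector_projection(v, u)
residual = [a - b for a, b in zip(v, p)]
print(scalar_projection(v, u), p)  # 3.0 [3.0, 0.0]
print(dot(residual, u))            # 0.0: the residual is perpendicular to u
```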
III. PROBLEM FORMULATION ONE


Recalling from Chapter II the expression for estimating the regression coefficients,

$$\hat{\beta} = (X^T X)^{-1} X^T Y,$$

there is a unique solution for $\hat{\beta}$ if the matrix $X^T X$ is nonsingular, that is, if it is of full rank. This is the case if the columns of $X$ are linearly independent, since the columns of $X^T X$ and the rows of $X$ span the same space [Ref. 4].

Now, given an $(n \times p)$ matrix $X$ of full rank with $n < p$, a finite-dimensional vector space $E^n$ can be defined in which $n$ linearly independent column vectors span the space (i.e., form a basis). Extending linear algebra concepts to the linear regression model of Section II.A, a geometric interpretation is that $k$ columns (where $k < n$) of the $X$ matrix span a $k$-dimensional subspace of the $n$-dimensional space $E^n$. Further, the least squares procedure, through 'best' subset selection, will select amongst the $\binom{p}{k}$ subsets of columns of $X$ a $k$-dimensional subspace of $E^n$ to predict the vector of dependent variables ($Y$) such that $\cos^2\theta = R^2$ is maximized ($\theta$ is minimized), where $\theta$ is the angle between $Y$ and its orthogonal projection onto a candidate subspace.
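The identity $R^2 = \cos^2\theta$ can be checked numerically in the simplest case, $k = 1$ with an intercept: after centering, the projection of $Y$ onto the span of a single centered column lies along that column, so $\theta$ is (up to sign) the angle between the two centered vectors. The pure-Python sketch below uses illustrative names and made-up data; it computes $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ from the fitted line and compares it with $\cos^2\theta$.

```python
import math


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def cos_angle(u, v):
    """Cosine of the angle between u and v."""
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def r2_simple_regression(x, y):
    """R^2 = 1 - SSE/SST for the least squares line y = b0 + b1 x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - ybar) ** 2 for b in y)
    return 1.0 - sse / sst


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r2 = r2_simple_regression(x, y)
cos2 = cos_angle(center(x), center(y)) ** 2
print(r2, cos2)  # equal up to rounding (both 100/148, about 0.6757)
```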


A. REFINEMENT OF 'COEFFICIENT OF FILL' (COF)

Consider the matrix $X$, and assume it has rank $n$. Next, require that $Y^*$, the vector of dependent variables, be obtained through standardized regression coefficients (see Section II.B); that is,

$$Y^* = \frac{1}{s_y}\left(Y - \bar{y}\,\mathbf{1}\right).$$

This requirement causes no loss in generality since

$$R^2_{Y^* \cdot X_{i_1} \cdots X_{i_k}} = R^2_{Y \cdot X_{i_1} \cdots X_{i_k}}.$$

Further, by requiring $Y^*$ to be a unit vector (normalized), a unique vector is specified (i.e., all other vectors in the direction of $Y^*$ will be scalar multiples of the normalized vector); that is,

$$Y^{*T} Y^* = 1.$$
The 'coefficient of fill' (defined in Section I.B) now becomes [Ref. 3]:

$$R^2_{\min}(X,k) = \min_{Y^*:\ Y^{*T}Y^* = 1}\ \max_{1 \le i_1 < i_2 < \cdots < i_k \le p} R^2_{Y^* \cdot X_{i_1} X_{i_2} \cdots X_{i_k}}$$





B. TRANSFORMATION OF THE COF TO A QUADRATIC FORM 

Wallenius [Ref. 3] showed that the $R^2$ portion of the COF can be transformed into a quadratic form. Such a transformation can be done as follows.

Identify arbitrarily any combination of $k$ columns of $X$. Let the $i$-th such combination be called $D_i$. Recall that the columns of $D_i$ define a subspace in $E^n$ and that any vector in $E^n$ can be projected onto that subspace. Assume that $D_i$ has rank $k$.

Figure 2. Subspace Spanned by $D_i$


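From linear algebra (vector projections) and figure two, it follows that the orthogonal projection $Z_i$ of $Y^*$ onto the subspace spanned by $D_i$ is

$$Z_i = \operatorname{proj}_{D_i} Y^* = \sum_{j=1}^{k} \left(Y^{*T} \tilde{x}_j\right) \tilde{x}_j,$$

where $\tilde{x}_1, \ldots, \tilde{x}_k$ is an orthonormal basis for $D_i$.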
From least squares regression, an alternate calculation for $Z_i$ is as follows:

$$Z_i = D_i (D_i^T D_i)^{-1} D_i^T Y^*.$$

Recalling that

$$\hat{\beta}_i = (D_i^T D_i)^{-1} D_i^T Y^*$$

and that the 'generalized inverse' of a matrix $D_i$ of full column rank is

$$D_i^{+} = (D_i^T D_i)^{-1} D_i^T,$$

the calculation of $Z_i$ can be simplified to

$$Z_i = D_i \hat{\beta}_i = D_i D_i^{+} Y^*.$$
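The two ways of computing $Z_i$ (summing over an orthonormal basis for the columns of $D_i$, or evaluating $D_i (D_i^T D_i)^{-1} D_i^T Y^*$) should produce the same vector. The pure-Python sketch below checks this on a small made-up example. The helper names are hypothetical, $D_i$ is stored as a list of its $k$ columns (assumed linearly independent), and the orthonormal basis comes from Gram-Schmidt, which is adequate for small, well-conditioned examples.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(cols):
    """Gram-Schmidt orthonormalization of linearly independent columns."""
    basis = []
    for col in cols:
        w = list(col)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = dot(w, w) ** 0.5
        basis.append([a / norm for a in w])
    return basis


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small k x k system."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def projection_via_basis(y, cols):
    """Z = sum_j (y . x~_j) x~_j over an orthonormal basis x~_1, ..., x~_k."""
    z = [0.0] * len(y)
    for q in orthonormal_basis(cols):
        c = dot(y, q)
        z = [a + c * b for a, b in zip(z, q)]
    return z


def projection_via_normal_equations(y, cols):
    """Z = D beta_hat with beta_hat = (D^T D)^{-1} D^T y."""
    DtD = [[dot(a, b) for b in cols] for a in cols]
    beta = solve(DtD, [dot(a, y) for a in cols])
    return [sum(bj * col[r] for bj, col in zip(beta, cols)) for r in range(len(y))]


D_cols = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(projection_via_basis(y, D_cols))
print(projection_via_normal_equations(y, D_cols))  # same vector up to rounding
```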


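In Section II.C, the cosine of the angle between two vectors was defined. To calculate the angle $\theta_i$ between $Y^*$ and its orthogonal projection $Z_i$, use

$$\cos\theta_i = \frac{Y^{*T} Z_i}{\|Y^*\|\,\|Z_i\|}.$$

Through substitution and simplification (using $Z_i = D_i D_i^{+} Y^*$ and the fact that $Y^{*T} Z_i = Z_i^T Z_i$ for an orthogonal projection), this can be written as

$$\cos\theta_i = \frac{\|Z_i\|}{\|Y^*\|} = \frac{\sqrt{Y^{*T} D_i (D_i^T D_i)^{-1} D_i^T Y^*}}{\|Y^*\|}.$$

Recalling that $\|Y^*\| = 1$ (Section III.A) and squaring both sides, $\cos^2\theta_i$ can be written as follows:

$$\cos^2\theta_i = \|Z_i\|^2 = Y^{*T} D_i (D_i^T D_i)^{-1} D_i^T Y^*.$$

Thus,

$$R^2_{Y^* \cdot D_i} = \cos^2\theta_i.$$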

Matrix algebra then allows the following conversion for $R^2_{Y^* \cdot D_i}$. Defining the $(n \times n)$ matrix

$$B_i = D_i (D_i^T D_i)^{-1} D_i^T,$$

we can write

$$R^2_{Y^* \cdot D_i} = Y^{*T} B_i Y^*.$$

Thus, the COF can be re-expressed as:

$$\min_{Y^*}\ \max_{i}\ Y^{*T} B_i Y^* \qquad \text{(COFQ)}.$$
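The matrix $B_i = D_i (D_i^T D_i)^{-1} D_i^T$ is the orthogonal projection ('hat') matrix onto the span of $D_i$, so it is symmetric, idempotent ($B_i B_i = B_i$), and has trace $k$. The sketch below builds it in pure Python and evaluates the quadratic form $Y^{*T} B_i Y^*$. It is illustrative only: the names are hypothetical, $D_i$ is stored row by row, and its columns are centered so that, for a centered unit-length $Y^*$, the quadratic form equals the usual $R^2$ with an intercept. For $k = 1$ that is the squared correlation, which the example checks.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small k x k system."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def hat_matrix(D_rows):
    """B = D (D^T D)^{-1} D^T for an n x k matrix D given as a list of n rows."""
    k = len(D_rows[0])
    G = [[sum(row[a] * row[b] for row in D_rows) for b in range(k)] for a in range(k)]
    # Column c of (D^T D)^{-1} D^T solves G h = (row c of D).
    H_cols = [solve(G, row) for row in D_rows]
    return [[sum(x * h for x, h in zip(row, hc)) for hc in H_cols] for row in D_rows]


def quadratic_form(y, B):
    """y^T B y."""
    return sum(y[r] * B[r][c] * y[c] for r in range(len(y)) for c in range(len(y)))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def unit(v):
    nrm = math.sqrt(sum(a * a for a in v))
    return [a / nrm for a in v]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_star = unit(center([2.0, 1.0, 4.0, 3.0, 6.0]))  # centered and unit length
B = hat_matrix([[v] for v in center(x)])         # k = 1, centered column
print(quadratic_form(y_star, B))                 # squared correlation, about 0.6757
```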

C. A GAME INTERPRETATION OF THE COFQ

The COFQ can be viewed as a game between a person and nature. The person tries to choose the matrix $B_i$ that will maximize $R^2$, and nature plays the part of an antagonist who tries to create the least favorable $Y^*$ to be predicted by that $B_i$.

The game is visualized as depicted in figure three. The regression 'black box' (labeled '1') determines the $\hat{\beta}$ whose coefficients express $Y$ as a linear combination of the 'best' subset of independent variables (columns of matrix $X$), $D_i$.





Figure 3. Game Interpretation of COFQ 


Nature, upon seeing the subspace represented by the matrix $B_i$, antagonistically chooses the worst $Y^*$ in $E^n$ so as to thwart the regression model's validity. This is represented in figure three as the box labeled '2'.

At first glance, this view of the problem hints that there exists an iterative solution to the COFQ. Movement towards such a solution might be measured by the generation of a sequence of successively lower values of $R^2$ for the COFQ. However, obtaining an optimal $Y^*$ depends on whether the sequence generated is convergent. Using Zangwill's general convergence theorem, it can be shown that this process will not generate a convergent sequence [Ref. 5]. For every $B_i$ selected, nature can find a vector $Y^*$ orthogonal to the corresponding subspace: in a finite-dimensional vector space, for any subspace of dimension less than that of the vector space itself, there exists a vector orthogonal to that subspace. This is shown in figure 4 for 3 dimensions, where the subspace formed by $x_1$ and $x_2$ is of dimension 2 and the vector $Y$ is normal to the $x_1 x_2$ plane.
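Nature's move in this game is easy to construct explicitly: strip from any starting vector its components along $\mathbf{1}$ and along the columns of the chosen $D_i$, and the normalized remainder is a centered, unit-length $Y^*$ with $R^2 = 0$ for that subset. The pure-Python sketch below (hypothetical names and made-up data, not taken from this paper) does this and shows that a different subset can still explain part of the resulting $Y^*$, illustrating why alternating the two moves need not settle down.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt; vectors that add no new direction are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis


def nature_response(subset_cols, start):
    """Unit vector orthogonal to the ones vector and to every column in subset_cols."""
    n = len(start)
    r = list(start)
    for q in orthonormalize([[1.0] * n] + subset_cols):
        c = dot(r, q)
        r = [a - c * b for a, b in zip(r, q)]
    norm = math.sqrt(dot(r, r))
    return [a / norm for a in r]


def r_squared(y, cols):
    """R^2 with intercept: squared length of the projection of centered y onto span(1, cols)."""
    n = len(y)
    ybar = sum(y) / n
    yc = [a - ybar for a in y]
    basis = orthonormalize([[1.0] * n] + cols)
    return sum(dot(yc, q) ** 2 for q in basis) / dot(yc, yc)


x0 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 0.0, 1.0, 3.0]
y_star = nature_response([x0], start=[3.0, 1.0, 0.0, 2.0])
print(r_squared(y_star, [x0]))  # about 0: the chosen subset explains nothing
print(r_squared(y_star, [x2]))  # about 0.159: another subset explains part of it
```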


Figure 4. Y Orthogonal to Subspace

D. FORMULATION AS A NON-LINEAR PROGRAMMING PROBLEM (NLP) FOR OPTIMIZATION

In Section III.B, it was shown that the COF has an equivalent representation using a quadratic form for the objective function. The matrix $B_i$ in the COFQ form is a square symmetric matrix of size $(n \times n)$ and has rank $k$. In general, a quadratic function $F(x)$ has the form

$$F(x) = \tfrac{1}{2} x^T G x + c^T x + \alpha$$

for a constant matrix $G$, column vector $c$, and scalar $\alpha$ (multiplication by $\tfrac{1}{2}$ is included in the quadratic term to avoid the appearance of a factor of two in the derivatives). The quantity $G$ is referred to as the Hessian matrix of $F$, which is the matrix of second partial derivatives. Thus, $B_i$ can be characterized as the Hessian matrix of the quadratic form $\tfrac{1}{2} Y^{*T} B_i Y^*$ (equivalently, the COFQ objective $Y^{*T} B_i Y^*$ has Hessian $2B_i$).

1. Modeling of the COFQ in Stages

The solution procedure for the COFQ can be broken down into several stages.

(a) For a particular $B_i$, optimize the quadratic form to find the minimum $Y^*$.

(b) Use an algorithm for optimal subset selection to look at some regions only and thereby avoid optimizing over all $\binom{p}{k}$ possible $B_i$. This is based upon the reasoning that some $B_i$ (along with the associated constraints to be specified in the next section) will define convex regions which are larger than others, and that the larger regions will produce a $Y^*$ which is 'worse' than smaller regions will. Amongst those regions over which optimization was done, select the $Y^*$ corresponding to the smallest $R^2$ value as the global minimum.

These steps are addressed in the remainder of this chapter.
2. Modeling as an NLP

This section addresses stage (a): optimizing with a particular $B_i$ to find the minimum $Y^*$. Each optimal $Y^*$ corresponding to a $B_i$ is a local optimum for the overall COFQ problem.

Modeling this stage as an NLP (non-linear programming) problem, the objective function becomes:

$$\min_{Y^*}\ Y^{*T} B_i Y^*.$$

From Section III.A, the constraints require that $Y^*$ be of unit length and that $Y$ be obtained through use of standardized regression coefficients. Assume the latter is met, hence the use of $Y^*$ henceforth. The unit length requirement was stated as

$$Y^{*T} Y^* = 1,$$

and the latter constraint was represented by (Section II.B)

$$\mathbf{1}^T Y^* = 0.$$
The NLP now becomes:

$$\begin{aligned} \min_{Y^*}\quad & Y^{*T} B_i Y^* \\ \text{subject to}\quad & Y^{*T} Y^* = 1, \\ & \mathbf{1}^T Y^* = 0, \\ & y_j^* \ \text{free}, \quad j = 1, \ldots, n. \end{aligned}$$

Note that the $y_j^*$ are free variables; otherwise the standardization requirement could not be met. This NLP is characterized as an NLP with a non-linear equality constraint, referred to in the literature as a NEP [Ref. 6]. The constraint is a quadratic form whose matrix is the identity. In searching for a locally optimal $Y^*$, the optimization search must move along a quadratic surface at unit length from the origin, that is, along an $n$-dimensional hypersphere. (Recall from Section II.B that the intercept of the regression equation is zero.) This may be visualized in 3 dimensions as depicted in figure 5.

Figure 5. Movement of Y along Quadratic Surface

This constraint requirement to move along a quadratic surface creates a non-convex NLP, which correspondingly increases the optimization complexity. Search direction methods for unconstrained nonlinear optimization are not applicable. Further, reduced-gradient and gradient-projection methods developed for problems with nonlinear constraints will probably fail due to the non-convexity.

The requirement to solve many NLPs as the $B_i$ are changed necessitates an optimization procedure that is general in nature.
IV. PROBLEM FORMULATION TWO


The inherent difficulty with formulation one prompted a search for a computationally simpler model. By investigating the geometrical representation of the COF, could a linear model be built that, when solved, would solve the original problem? If an exact solution to the original problem were not reached, could an approximation be found that meets the 'practical need' for the COF value as expressed in the Introduction of this paper?

Such a linear model could utilize the extensive known results of linear programming, especially capitalizing on the speed of computation of existing linear programming algorithms.

A methodology similar to that used in Section III.D to break the COFQ problem into stages will be used in the solution approach to the COF. One stage is optimizing to find the minimum $Y$'s locally through the use of a surrogate objective function. The second stage is either determining the global minimum $Y^*$ through enumeration, or selecting the most attractive local $Y^*$ as the answer (thereby approximately solving the COF).⁴

⁴Enumeration seems economically feasible into the neighborhood of approximately 14 candidate predictor variables.





A. GEOMETRICAL INTERPRETATION 


Let $a^T = (a_1, \ldots, a_n)$ be a non-zero vector. Consider the vectors $X$ that satisfy

$$a^T X = d$$

for some scalar $d$. The set of $X$ that satisfies this is defined to be a hyperplane. The vector $a$ is termed the normal to the hyperplane, and the normalized vector

$$\frac{a}{\|a\|},$$

which has Euclidean length unity, is said to be the 'unit normal' to the hyperplane.

One can think of a hyperplane as a shift from the origin of the $(n-1)$-dimensional subspace orthogonal to $a$ [Ref. 7]. This can be seen in figure 6. Note that if $d = 0$, the hyperplane (subspace) passes through the origin. This can be seen for three dimensions in figure 7.

Figure 6. Hyperplane in 3-Dimensional Space

Figure 7. Hyperplanes through Center of a Sphere

Recall from Section I.B that the column vectors of the matrix $X$ (without dummy variable) used in linear regression define an $n$-dimensional finite vector space called $E^n$. Further recall that combinatorial sets of $k$ such vectors span subspaces of $E^n$. If the assumption of linearity is valid for the regression model, then interpretation of such subspaces as being linear is valid. Thus, let each normalized column vector of $X$ represent the 'unit normal' to an $(n-1)$-dimensional hyperplane. Next, view each such hyperplane as a constraint of the form

$$a_i^T x = 0, \qquad i = 1, 2, \ldots, p,$$

where $a_i$ denotes the $i$-th normalized column of the data matrix of $n$ observations on $p$ predictor variables, and $x^T = (x_1, x_2, \ldots, x_n)$.⁵ Thus, the general equation for the $i$-th constraint can be written as follows:

$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = d_i,$$

where $d_i = 0$ for $i = 1, \ldots, p$.

Collectively, these $p$ constraints (or hyperplanes) intersect within the hyperspace and form convex polytopes; specifically, cones. An elementary example of this is shown in figure 8. Any vector in $E^n$ originating at the origin (including $Y$, the vector of interest) will geometrically lie within some such cone as defined by a set of hyperplanes.

⁵A notation switch has taken place to enable an easy transition to commonly used linear programming notation. This $x$ does not represent the predictor variables, but rather real numbers to be determined.

Figure 8. Convex Regions Formed in 2-Dimensions
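Which cone a given vector lies in can be read off from the side of each hyperplane $a_i^T x = 0$ on which it falls, that is, from the signs of $a_i^T x$; this sign pattern is exactly a choice of $\le$ or $\ge$ for every constraint. The pure-Python sketch below (hypothetical helper names) normalizes the predictor columns into unit normals and returns that pattern. Vectors with the same pattern and no zero entries lie in the same open cone, and scaling a vector does not change its pattern.

```python
import math


def unit_normals(x_cols):
    """Normalize each predictor column so it can serve as the unit normal a_i."""
    normals = []
    for col in x_cols:
        norm = math.sqrt(sum(v * v for v in col))
        normals.append([v / norm for v in col])
    return normals


def cone_signature(normals, x, tol=1e-12):
    """Side of each hyperplane a_i^T x = 0 on which x lies: +1, -1, or 0 (on the plane)."""
    signs = []
    for a in normals:
        s = sum(ai * xi for ai, xi in zip(a, x))
        signs.append(0 if abs(s) <= tol else (1 if s > 0 else -1))
    return tuple(signs)


A = unit_normals([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(cone_signature(A, [2.0, 1.0]))   # (1, 1, 1)
print(cone_signature(A, [1.0, -2.0]))  # (1, -1, -1)
print(cone_signature(A, [4.0, 2.0]) == cone_signature(A, [2.0, 1.0]))  # True: same cone
```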


B. ALGEBRAIC MODEL 
Consistent with linear programming notation, the $p$ constraints defined by their unit normals can be written as follows:

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= 0 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= 0 \\ &\;\;\vdots \\ a_{p1} x_1 + a_{p2} x_2 + \cdots + a_{pn} x_n &= 0 \end{aligned}$$

This system of constraints can be put into the standard form $Ax = b$, where $b = 0$. Matrix $A$ is the matrix of coefficients and is actually the transpose of the matrix of normalized column vectors from the independent (predictor) variable data matrix; that is, the $i$-th row of $A$ is $a_i^T$.

Using the concepts of halfspaces, specifying $\le$ or $\ge$ in each relationship above determines on which side of each constraint a point in $E^n$ lies. Putting the system of constraints into standard form will then require a slack (surplus) variable for each constraint.


C. CHARACTERIZATION OF THE CONES

Since the regression model projects the dependent variable $Y$ onto the subspaces (hyperplanes), we want to know, for a point (vector) $Y$, which hyperplanes are the closest to $Y$. This can be rephrased as asking, "Within which cone (convex polytope) does $Y$ lie?"

Consider some arbitrary point (i.e., a vector originating from the origin) $X_0 = (x_1, x_2, \ldots, x_n)$ in the hypersphere. Further, assume that the points are constrained in the distance they can be from the origin, so as not to have an infinite ray.

Now, scanning out from $X_0$, it can be seen that some hyperplanes are 'closer' to $X_0$ than others ('closest' being defined by the smallest angle $\alpha_j$ between $X_0$ and any vector in a particular hyperplane). Define the set of hyperplanes which are closest to $X_0$ when considering all directions as the 'bounding hyperplanes', and the region they bound as the cone within which $X_0$ lies; refer to such a cone as 'hole $H_i$'. The specific sizes and number of holes created by the intersecting hyperplanes depend on the values given in the problem data. As we scan out from $X_0$, we know for certain that the closest hyperplane is one of the walls. Note, however, that we cannot extend this argument to identify the other walls of the hole in which $X_0$ lies. Thus, the question of identifying the walls of the hole (bounding hyperplanes) remains. The concepts for reduction of linear inequalities present an answer to this question [Ref. 7].

1. Reduction of Inequalities

Suppose that, for any arbitrary point $X$ in $E^n$ (subject to an upper bound), all given hyperplanes are examined. Then there exist hyperplanes 'exterior' to the convex region defining $H$. These exterior hyperplanes, when viewed as constraints, are non-binding. As an elementary example of this, a hole $H_1$ is depicted in figure 9 corresponding to the following system of constraints:

$$\begin{aligned} a_1 X &\le 0 \qquad \text{(hyperplane } P_1\text{)} \\ a_2 X &\ge 0 \qquad \text{(hyperplane } P_2\text{)} \\ a_3 X &\ge 0 \qquad \text{(hyperplane } P_3\text{)} \end{aligned}$$


As can be seen from figure 9, the third constraint is exterior to the hole $H_1$ and can be termed 'redundant'. This redundant constraint can be eliminated without changing the solution set.

Figure 9. 2-Dimensional Depiction of Non-Binding $P_3$

Now, suppose a convex region is specified by adding slack variables and putting the system of equations in standard form. Thus (assuming artificial variables have been forced out), we get:

$$\begin{aligned} a_1 X + s_1 &= 0 \\ -a_2 X + s_2 &= 0 \\ -a_3 X + s_3 &= 0 \\ s_1, s_2, s_3 &\ge 0 \end{aligned}$$

In standard form, this constraint redundancy is reflected in that $s_3$ is a 'non-extremal' variable, and hence it, together with the third constraint, can be eliminated. This special example generalizes to higher dimensional problems: in general, redundant inequalities show up as having non-extremal slack variables.

Here the L.B. (lower bound) is any arbitrary positive real number large enough to avoid roundoff problems and to prevent convergence at the origin. The following algorithm, using simplex techniques, will identify the binding constraints (walls) for any hole.

2. Removal of Redundant Constraints

Assume that some combination of 'less than or equal to' and 'greater than or equal to' inequality signs for the $p$ constraints is specified. Call this system of $p$ inequalities the 'resource constraints' and put it in standard form.

a. Find a feasible solution: $\min \sum_i A_i$, where $A_i$ is the artificial variable added to the $i$-th constraint.⁶

b. Let $S = \{s_1, s_2, \ldots, s_p\}$, the set of slack variables.

c. Select $s_i$ from $S$.

d. Minimize $s_i$ subject to the resource constraints and variable bounds.

e. If $s_i$ is non-extremal ($s_i > 0$ at the minimum), place index $i$ in set $R$.

f. If set $S$ has been exhaustively examined, go to step g; otherwise increment $i$ and go to step c.

g. Remove from matrix $A$ all constraints $i$ such that $i \in R$.

⁶Failure to find a feasible solution for that combination of ($\ge$, $\le$) signs implies that a convex region is not defined, and therefore that combination can be ignored.

The result of this is that the boundaries of the convex region within which a specified point (vector) lies have been identified. Any optimization need only consider this subset of the resource constraints.
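The simplex-based procedure above needs an LP solver. For intuition only, the same question (which constraints are walls and which are redundant) can be answered without one in the 2-dimensional case: a constraint is a wall exactly when one of the two directions lying along its hyperplane satisfies all the other constraints. The pure-Python sketch below is that geometric shortcut, not the algorithm of this paper; it assumes a pointed 2-D cone with non-empty interior and no two parallel normals, and the names are hypothetical.

```python
import math


def walls_2d(normals, signs, tol=1e-12):
    """Split the constraints signs[i] * (a_i . x) >= 0 of a 2-D cone into walls and redundant ones.

    Constraint j is a wall exactly when one of the two directions along its line
    a_j . x = 0 satisfies every constraint. Assumes a pointed cone with non-empty
    interior and no two parallel normals.
    """
    walls, redundant = [], []
    for j, (a1, a2) in enumerate(normals):
        along_line = [(-a2, a1), (a2, -a1)]
        is_wall = any(
            all(s * (b1 * d1 + b2 * d2) >= -tol for (b1, b2), s in zip(normals, signs))
            for d1, d2 in along_line
        )
        (walls if is_wall else redundant).append(j)
    return walls, redundant


r = 1.0 / math.sqrt(2.0)
normals = [(1.0, 0.0), (0.0, 1.0), (r, r)]
print(walls_2d(normals, [1, 1, 1]))   # ([0, 1], [2]): x1 + x2 >= 0 is redundant
print(walls_2d(normals, [-1, 1, 1]))  # ([0, 2], [1]): here x2 >= 0 is redundant
```

In higher dimensions this shortcut no longer applies, which is where the slack-minimization test of steps a through g is needed.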


D. OPTIMIZATION FOR LOCAL MINIMA 

The COF seeks a $Y$ such that $R^2$ is minimized. For a given $Y_0$ and the corresponding cone within which $Y_0$ lies, this minimum is approached as $Y$ moves away from the nearest hyperplane. Recall that as the angle of projection increases, $R^2$ decreases. However, past a certain position, the angle of projection between $Y$ and a different hyperplane will decrease enough that the regression will select this second hyperplane to project $Y$ onto instead, since a higher $R^2$ value can be obtained.

If this process were to occur between $Y$ and each of the walls of the hole simultaneously, an 'equilibrium point' would be reached. This equilibrium point is defined to be the center of the hole. This can be visualized in 3 dimensions in figure 10.

Figure 10. Vector Y at Equilibrium Point


It is at this equilibrium point that, if Y were projected onto any of the walls, the angles of projection would be at their worst; we are maximizing the minimum angle. The Y representing this point, normalized to be unique, is the worst Y to be predicted using regression for this region of the hypersphere.
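The "maximize the minimum angle" criterion can be checked directly. A small pure-Python sketch, assuming the walls of the hole are hyperplanes through the origin given by their normal vectors; the angle between Y and its projection onto a wall with normal n is arcsin(|n·Y| / (‖n‖‖Y‖)). The octant example and the function name `wall_angles` are illustrative only. For the positive octant of R³, the direction (1, 1, 1) gives the largest minimum angle, arcsin(1/√3).

```python
import math


def wall_angles(normals, y):
    """Angle between y and its projection onto each wall {x : n . x = 0}."""
    norm_y = math.sqrt(sum(v * v for v in y))
    angles = []
    for n in normals:
        norm_n = math.sqrt(sum(v * v for v in n))
        dot = sum(a * b for a, b in zip(n, y))
        angles.append(math.asin(min(1.0, abs(dot) / (norm_n * norm_y))))
    return angles


# The positive octant of R^3 is bounded by the walls x = 0, y = 0, z = 0.
walls = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
for y in [(1, 1, 1), (1, 2, 3), (1, 1, 4)]:
    worst = min(wall_angles(walls, y))      # the wall regression would pick
    print(y, round(math.degrees(worst), 2))
```

The off-center directions have a smaller minimum angle, so regression onto their nearest wall yields a higher R² than at the center.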

The search for a local minimum Y thus becomes a search for a vector that originates at the origin and passes through the center of the hole. Considering the hole in the context of a convex region, we are searching not for a solution at an extreme point, but rather at the center of the region.
Viewing the previous figure from 'topside', we see Y as a point in the center of an (n-1)-dimensional convex region.

Figure 11. Top View of 2-Dimensional Convex Region


Such an equilibrium point can be approximated by optimizing the slack associated with the extremal constraints representing the hole. The objective function for optimization can be formulated as

$$\max_{x}\ \left[\ \min_{i}\ c_i^{T} x\ \right]$$

where c_i is defined for all i ∉ R, that is, for the extremal constraints. Note that each c_i contains only one non-zero coefficient, the coefficient of the slack variable for the i-th resource constraint. The final x obtained is the optimal value of Y.
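The max-min objective is not linear as written, but the standard epigraph device turns it into an L.P.: introduce a variable Z, require Z ≤ each slack, and maximize Z. A minimal sketch, assuming SciPy's `linprog` and, for simplicity, a bounded polytope A x ≤ b rather than the homogeneous constraints used later in this section; the function name `max_min_slack` is hypothetical. Slacks are used unscaled; if every row of A had unit norm, the result would be the Chebyshev center of the region.

```python
import numpy as np
from scipy.optimize import linprog


def max_min_slack(A, b, bounds):
    """Solve max_x min_i s_i with s_i = b_i - a_i x, via max Z s.t. Z <= s_i.

    The decision vector is (x, Z); linprog minimizes, so the objective is -Z.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                    # maximize Z
    A_ub = np.vstack([
        np.hstack([A, np.zeros((m, 1))]),           # a_i x <= b_i      (s_i >= 0)
        np.hstack([A, np.ones((m, 1))]),            # a_i x + Z <= b_i  (Z <= s_i)
    ])
    b_ub = np.concatenate([b, b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=list(bounds) + [(None, None)], method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.x[:n], res.x[-1]


# Unit square 0 <= x, y <= 1: the point with the largest smallest slack.
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [1, 1, 0, 0]
x, z = max_min_slack(A, b, bounds=[(-10, 10)] * 2)
print(x, z)
```

For the square, the optimum is the center (0.5, 0.5) with Z = 0.5: an interior point, not an extreme point, as the text anticipates.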

Recall that the original form of the COF required Y^T Y = 1 (i.e., unit length), and that for the COFQ this requirement created a non-convex NLP. In Figure 12, for 3 dimensions, it can be seen that there exists a hyperplane 'tangent' to this quadratic surface such that the closest point on the hyperplane to the origin is the optimal Y. The vector Y is then the orthogonal unit normal to the hyperplane.

Figure 12. Hyperplane Touching Surface at One Point
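The tangency claim follows from the Cauchy-Schwarz inequality. A short derivation, assuming a is a unit vector (‖a‖ = 1) and Y is any point on the hyperplane a^T Y = 1:

```latex
\begin{aligned}
&a^{T}Y = 1,\ \|a\| = 1 \;\Longrightarrow\; 1 = a^{T}Y \le \|a\|\,\|Y\| = \|Y\| \quad \text{(Cauchy-Schwarz)},\\
&\text{equality} \iff Y = \lambda a,\ \lambda \ge 0 \;\Longrightarrow\; a^{T}Y = \lambda = 1 \;\Longrightarrow\; Y = a,\\
&\text{so } \{Y : a^{T}Y = 1\} \cap \{Y : Y^{T}Y = 1\} = \{a\}, \text{ and } \min_{a^{T}Y = 1} \|Y\| = 1 \text{ is attained only at } Y = a.
\end{aligned}
```

Because every point with a^T Y = 1 satisfies ‖Y‖ ≥ 1, the half-space a^T X ≤ 1 contains the unit ball and touches the unit sphere only at a. This is what allows a linear constraint to stand in for Y^T Y = 1 near the current estimate of Y.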

Now, let the initial value of Y (Y is normalized) specify a bounding hyperplane to the hole. For each new value of Y, a new bounding hyperplane can be specified, thereby linearly approximating the quadratic surface in successive increments as Y moves toward the center of the hole (optimal Y).

Taking matrix A with the understanding that it contains only the extremal resource constraints, the optimization problem can be formulated as

$$
\begin{aligned}
\max_{x}\ & \left[\ \min_{i}\ c_i^{T} x\ \right] \\
\text{s.t.}\ & A X + I S_A = 0 \\
& a_k^{T} X \le 1 \\
& S_A \ge 0 \\
& X \text{ free}
\end{aligned}
$$

where a_k is the value of X from the (k-1)th iteration, so that the bounding hyperplane is specified by the current value of X. For each hole, the initial X is the normalized value of the BFS to the L.P. that was reached using the algorithm in Section C.2. The values of Y for the (k-1)th iteration become the coefficients of a_k (kth iteration). Note that a_k is normalized between iterations. The quantity I is an identity matrix and S_A is the column vector of slack variables corresponding to the extremal resource constraints (hence the use of subscript A).

This problem can be rewritten in final form as the L.P.:

$$
\begin{aligned}
\max\ & Z \\
\text{s.t.}\ & A X + I S_A = 0 \\
& Z \le c_i^{T} x \quad \text{for all extremal } i \\
& a_k^{T} X \le 1 \\
& S_A \ge 0, \quad X \text{ free}
\end{aligned}
$$
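The successive linearization can be sketched as a loop that solves one L.P. per iteration and then renormalizes X to obtain the next a_k. This is an illustrative sketch under stated assumptions, not the author's implementation: it uses SciPy's `linprog` (HiGHS) as the solver, eliminates S_A through S_A = -A X (so the constraints become A X ≤ 0, Z ≤ -a_i X and a_k^T X ≤ 1), and the function name `center_direction`, the starting vector `a0` and the stopping tolerance are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linprog


def center_direction(A, a0, iters=50, tol=1e-10):
    """Repeat the final L.P. with a_k updated to the normalized X each pass.

    Variables are (X, Z); S_A = -A X is eliminated, so row i of A gives
    S_i = -a_i X.  linprog minimizes, hence the objective -Z.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    a = np.asarray(a0, dtype=float)
    a = a / np.linalg.norm(a)                        # normalized starting Y
    c = np.zeros(n + 1)
    c[-1] = -1.0                                     # maximize Z
    b_ub = np.concatenate([np.zeros(2 * m), [1.0]])
    for _ in range(iters):
        A_ub = np.vstack([
            np.hstack([A, np.zeros((m, 1))]),        # A X <= 0        (S_A >= 0)
            np.hstack([A, np.ones((m, 1))]),         # Z + a_i X <= 0  (Z <= S_i)
            np.append(a, 0.0)[None, :],              # a_k^T X <= 1
        ])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * (n + 1), method="highs")
        if not res.success:
            raise ValueError(res.message)
        x = res.x[:n]
        norm = np.linalg.norm(x)
        if norm == 0.0:
            raise ValueError("L.P. returned X = 0; the cone may have no interior")
        a_next = x / norm                            # coefficients of a_{k+1}
        if np.linalg.norm(a_next - a) < tol:         # direction stopped moving
            return a_next
        a = a_next
    return a


# Positive octant: the walls x >= 0, y >= 0, z >= 0, written as -X <= 0.
print(center_direction(-np.eye(3), a0=[1.0, 2.0, 3.0]))
```

For the octant, the loop settles on the direction (1, 1, 1)/√3, the same center found by the angle check earlier in this section.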


E. OPTIMIZATION FOR A GLOBAL MINIMUM 
Recall from Section IV.B that the direction of the inequality signs for the resource constraints determines which convex set is the feasible region. There are therefore 2^p such sets to consider for local optimization in the search for a global minimum to the COF.
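For small p, the exhaustive strategy is a single loop over every sign pattern. A pure-Python sketch, assuming a hypothetical callback `solve_hole` that performs the per-hole work of Sections C and D (feasibility check, redundancy removal and the max Z L.P.) and returns Z, or None for an infeasible pattern. The toy callback below only stands in for that L.P.; a larger Z means a better (lower) minimum.

```python
from itertools import product


def enumerate_holes(p, solve_hole):
    """Optimize every one of the 2**p sign patterns and keep the largest Z.

    Signs are +1 (keep a resource constraint as written) or -1 (reverse it).
    """
    best_signs, best_z, calls = None, float("-inf"), 0
    for signs in product((1, -1), repeat=p):
        calls += 1
        z = solve_hole(signs)
        if z is not None and z > best_z:
            best_signs, best_z = signs, z
    return best_signs, best_z, calls


# Toy stand-in for the per-hole L.P.: Z counts the +1 signs, and any pattern
# that reverses constraint 0 is treated as infeasible.
def toy_hole(signs):
    return None if signs[0] == -1 else sum(s == 1 for s in signs)


print(enumerate_holes(3, toy_hole))   # examines 2**3 = 8 holes
print(2 ** 14)                        # holes to examine when p = 14
```

The number of L.P. solves doubles with each added candidate variable, which is why enumeration becomes uneconomical somewhere around p = 14.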

When the number of candidate predictor variables is small, say up to about p = 14, enumeration is economically practical. For problems with p larger than this, only holes that look larger than most of the other holes could be optimized as 'candidates' for the global minimum. Such a selective optimization would require some type of global information prior to any optimization, and a procedure to find such information was not found. This impasse led to the algorithm presented below. This algorithm, when given a local optimum, searches adjacent holes for a better minimum. If none better is found, it stops; otherwise, it uses the best hole and continues to search from there. In theory it might search all 2^p holes, but this is doubtful for real-world problems (an iteration stop could also be added if desired). Note that the solution is still local in nature, although it may be the global solution. A heuristic approach may have to be developed to decide which hole to use to begin such a search.

1. Algorithm: Searching the Neighborhood about Y Min 

An algorithm to search the neighborhood about a local minimum in an effort to find the global solution or, failing that, a better minimum (larger Z value) is as follows:

a. Select some combination of inequality signs (≤, ≥) for the resource constraints.

b. Get a BFS (basic feasible solution), remove redundant constraints, and then maximize Z (Z as defined in the previous section).

c. Record the extremal constraints as set E (along with the direction of the inequalities), and record the values of x* and Z.

d. For the kth constraint of E, reverse the inequality (multiply the constraint by -1). Let J = {kth constraint, reversed} ∪ {all resource constraints - k}. Optimize over J as in Step b. If Z_k > Z, record as in Step c (using a label other than E), and set 'NEXT' = k.

e. If all elements of set E have been exhaustively examined, go to Step f; otherwise, increment k and go to Step d.

f. If no Z for set E is better than the initial Z, stop. Otherwise, let J = {constraint 'NEXT', reversed} ∪ {all resource constraints - 'NEXT'}. Fix 'NEXT' so as not to reverse its inequality sign again. Go to Step b.


This results in a minimum that, although local, can be considered the best solution possible in that region of the hypersphere.
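Steps a through f can be sketched as a local search over sign patterns. A pure-Python sketch, assuming a hypothetical callback `solve_hole(signs)` that returns (Z, E) for a hole, with E the set of indices of its extremal constraints, or None when the pattern is infeasible. When several reversals improve Z, the sketch moves to the one with the largest Z, following the statement in Section E that the algorithm uses the best hole; the optional `max_iter` argument plays the role of the iteration stop mentioned there.

```python
def neighborhood_search(signs, solve_hole, max_iter=None):
    """Flip one extremal constraint at a time and move to the best improving
    neighbor until no single flip raises Z."""
    signs = tuple(signs)                           # Step a
    start = solve_hole(signs)                      # Step b
    if start is None:
        raise ValueError("starting sign pattern defines no convex region")
    z, E = start                                   # Step c
    fixed = set()                                  # constraints never reversed again
    it = 0
    while max_iter is None or it < max_iter:       # optional iteration stop
        it += 1
        next_k, next_result = None, None
        for k in sorted(set(E) - fixed):           # Steps d-e
            trial = signs[:k] + (-signs[k],) + signs[k + 1:]
            r = solve_hole(trial)
            if r is not None and r[0] > z and (
                    next_result is None or r[0] > next_result[0]):
                next_k, next_result = k, r         # 'NEXT' = best improvement so far
        if next_k is None:                         # Step f: no better neighbor
            break
        signs = signs[:next_k] + (-signs[next_k],) + signs[next_k + 1:]
        z, E = next_result                         # reuse the Step d solve for Step b
        fixed.add(next_k)                          # fix 'NEXT'
    return signs, z


# Toy stand-in: Z counts the +1 signs and every constraint is extremal.
def toy_hole(signs):
    return sum(s == 1 for s in signs), set(range(len(signs)))


print(neighborhood_search((-1, -1, -1), toy_hole))
```

Reusing the L.P. result from Step d avoids re-solving the chosen neighbor when the search returns to Step b.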


Using the algorithms presented and the solution obtained, the analyst can compute the corresponding R² value by forcing the regression or by using the equations in Section III.B. He then has a relative measure of his model's applicability: the R² value of his prediction equation (obtained through least squares regression) can be compared with the lowest R² attainable for that set of candidate predictor variables.





V. SUMMARY 


The empirical model builder, in utilizing R² as a measure of 'goodness of fit', needs information concerning the quality of this statistic. This paper has addressed this problem by using optimization techniques to help the model builder assess the amount of confidence that can be placed in an R² value pertaining to a particular set of candidate predictor variables.

The linear programming algorithms presented offer a practical, fast, and cost-effective methodology for searching the hyperspace in which the linear regression model operates. The lowest (global) value of R² achievable for a particular set of cost data can be found when the number (p) of candidate predictor variables is small. When the number of variables nears fourteen or more, however, a local minimum must be accepted because of the computational cost inherent in the presented methodology.